Novel ribotype/sequence type associations and diverse CRISPR-Cas systems in environmental Clostridioides difficile strains from northern Iraq

Abstract The environment is a natural reservoir of Clostridioides difficile, and here, we aimed to isolate the pathogen from seven locations in northern Iraq. Four of the sites yielded thirty-one isolates (ten from soils, twenty-one from sediments), which together represent ribotypes (RTs) 001 (five), 010 (five), 011 (two), 035 (two), 091 (eight), and 604 (nine). Twenty-five of the isolates (∼81%) are non-toxigenic, while six (∼19%) encode the toxin A and B genes. The genomes of eleven selected isolates represent six sequence types (STs): ST-3 (two), ST-15 (one), ST-107 (five), ST-137 (one), ST-177 (one), and ST-181 (one). Five novel RT/ST associations: RT011/ST-137, RT035/ST-107, RT091/ST-107, RT604/ST-177, and RT604/ST-181 were identified, and the first three are linked to RTs previously uncharacterized by multilocus sequence typing (MLST). Nine of the genomes belong to Clade 1, and two are closely related to the cryptic C-I clade. Diverse multiple prophages and CRISPR-Cas systems (class 1 subtype I-B1 and class 2 type V CRISPR-Cas systems) with spacers identical to other C. difficile phages and plasmids were detected in the genomes. Our data show the broader diversity that exists within environmental C. difficile strains from a much less studied location and their potential role in the evolution and emergence of new strains.


Introduction
Nosocomial Clostridioides difficile infection (CDI) is characterised by antibiotic-induced diarrhoea and pseudomembranous colitis (Czepiel et al. 2019). The identification of clinical C. difficile ribotypes (RTs) in environmental settings indicates putative connections with humans and animals and could contribute to the emergence of new strains in hospitals and thus pose a significant health risk (Janezic et al. 2016, Czepiel et al. 2019, Williamson et al. 2022). Environmental C. difficile strains that are genetically related to those isolated from humans in clinical settings suggest that the same strains can inhabit multiple niches and that the environment is a reservoir of CDI (Knight et al. 2017, Rodriguez Diaz et al. 2018, Janezic et al. 2020, Lim et al. 2020, Williamson et al. 2022). Core genome single nucleotide variant (SNV) analysis revealed that 42% of human strains showed a clonal relationship (separated by ≤2 SNVs in their core genome) with one or more strains from environmental samples (Knight et al. 2017, Janezic et al. 2020). This strongly supports a persistent community reservoir with long-range dissemination. Since the sources/reservoirs outside the hospital setting play a significant role in the transmission of CDI, continuing molecular and genomic surveillance of strains from these sources is vital to find opportunities to reduce the overall CDI burden (Knight et al. 2017, Lim et al. 2020).
Clostridioides difficile diversity is mainly characterized using PCR ribotyping, which distinguishes the strains based on the size and copy number of the 16S-23S rRNA intergenic spacer region (Indra et al. 2008, Chatterjee and Raval 2019). Polymerase chain reaction (PCR) ribotyping is clearly useful for outbreak investigations (Seth-Smith et al. 2021) and has relatively equal discriminatory power to multilocus sequence typing (MLST), which identifies C. difficile strains based on the combinations of seven unique housekeeping genes that allow designation of allele profiles or sequence types (ST) to represent a genotype (Griffiths et al. 2010, Knight et al. 2017, Janezic and Rupnik 2019). Whole genome sequencing (WGS), however, permits single nucleotide-level strain resolution over all genomic space; thus, it is essential for long-term epidemiological, evolution, and population dynamics studies (Dominguez et al. 2016, Dingle et al. 2017, Muñoz et al. 2017, Uelze et al. 2020). WGS is currently accessible due to the low sequencing cost and the availability of public genome data, which provide valuable resources for more in-depth genome comparisons than ribotyping.
Clostridioides difficile surveillance is more effective in western countries, but very few epidemiological studies are reported in northern Iraq, which leaves a significant geographic gap in awareness of this bacterium in this part of the world. We have reported the genomes of three novel Clostridium sp. strains isolated from the environment in northern Iraq (Rashid et al. 2016), but, to date, no other environmental C. difficile genomes from this region have been reported. This highlights the paucity of knowledge that exists on strains such as RTs 001, 010, 011, and 035 that are circulating in the environment in this part of the globe and their potential role in clinical settings (Hargreaves et al. 2013, Hargreaves et al. 2016, Janezic et al. 2016). To further knowledge in this area and strengthen the evidence for the existence of clinically relevant C. difficile strains in natural environments, here, we isolated and genetically characterized environmental isolates from northern Iraq. We included strains from our previous studies to conduct whole-genome analyses to ascertain their RT/strain type relationships. Furthermore, we analysed the diverse CRISPR-Cas systems found within the strains and compared these features to strains from other regions to better ascertain possible genetic interactions that occurred through horizontal gene transfer via prophage elements and their role in C. difficile evolution.

Sampling sites
To isolate C. difficile from northern Iraq, soil (seven) and sediment (five) samples were collected from seven sites: Hamamok, Dokan, Jalee, Chnarok, and Taq Taq rivers, and Safeen and Haibat Sultan mountains between 2012 and 2013 (Supplementary Table S1). Samples were collected into screw-capped, sterile Falcon tubes, immediately stored at 4 °C, and processed within 2 weeks of collection.

Recovery of C. difficile isolates from environmental samples
Clostridioides difficile was isolated using previously described enrichment procedures (Hargreaves et al. 2013). Briefly, ∼1 g of soil/sediment was mixed with 10 mL of fastidious anaerobic broth supplemented with 250 μg mL⁻¹ cycloserine and 8 μg mL⁻¹ cefoxitin (Bioconnections, Leeds, UK) to select for C. difficile. Also, 0.1% sodium taurocholate (Sigma-Aldrich, Dorset, UK) was added to the enrichment to enhance spore germination (Foster and Riley 2012). The cultures were incubated for 10 days in a MiniMACS anaerobic chamber (Don Whitley Scientific, West Yorkshire, UK; 10% H₂, 5% CO₂, and 85% N₂) at 37 °C, then centrifuged for 10 min at 5000 × g. To further select for C. difficile spores and reduce other bacterial contaminants, the pellet was treated with an equal volume of industrial methylated spirit and incubated for 30 min at room temperature. A loopful of the mixture was spread on Brazier's cycloserine, cefoxitin, and egg yolk (CCEY) selective agar plates and incubated anaerobically for 48 hours. Clostridioides difficile colonies were purified through three further rounds of subculturing on Brain Heart Infusion (BHI) agar (Oxoid, Ltd., UK) supplemented with 7% defibrinated horse blood (TCS Biosciences, Ltd., UK). The presumptive colonies were identified by the characteristic horse manure smell, colony morphology, and yellow-green fluorescence under long-wave ultraviolet light (Delmée 2001). The isolates were confirmed by PCR targeting the C. difficile 16S rRNA gene, as described by Rinttilä et al. (2004). Bacterial isolates were stored in Protect bacterial preservers (Technical Service Consultants, Ltd., Heywood, UK) at −80 °C.

Ribotyping and toxin gene characterization of C. difficile isolates
To determine if multiple RTs were found in each site or sample, ten randomly selected bacterial isolates from each sample were subjected to conventional and capillary PCR ribotyping targeting the 16S-23S rRNA intergenic spacer using primers 5′-GTGCGGCTGGATCACCTCCT-3′ and 5′-CCCTGCACCCTTAATAACTTGACC-3′ (Indra et al. 2008). DNA was extracted from broth cultures that were grown for 18-24 hours anaerobically using 5% Chelex® (BioRad Laboratories, California, USA). The PCR ribotyping conditions were denaturation at 95 °C for 120 s, followed by 30 cycles of denaturation at 92 °C for 60 s, annealing at 55 °C for 60 s, elongation at 72 °C for 90 s, and a final extension at 72 °C for 5 min. PCR products alongside a 100-bp DNA ladder (Fermentas, York, UK) were resolved on a 3% Response regular agarose gel (Geneflow, Staffordshire, UK) prepared in 1× Tris-acetate-EDTA buffer and stained with GelRed (Biotium, Hayward, California, USA; Nale et al. 2012). Images were visualized using the SynGene application in a UV transilluminator. Fragments from capillary ribotyping were analysed using Peak Scanner software v1.0 (Applied Biosystems, UK). The similarity of the strains was assessed using the MultiVariate Statistical Package (MVSP, Kovach Computing Services, Anglesey, UK) based on the presence of amplicons of a particular size. Sorensen's distance was calculated between each combination of isolates, and the isolates were clustered (Supplementary Fig. S1 and B; Shan et al. 2012, Nale et al. 2016). Eleven strains from the sites with six distinct patterns of amplicons were submitted to Leeds Reference Laboratory, UK for further confirmation of RT designation and toxin gene presence. All the isolates were further screened for the presence of the toxin genes using multiplex PCR with primer pairs NK2 and NK3, which amplify a partial sequence of tcdA; NK9 and NK11, which target the essential repeat region within tcdA (Kato et al. 1998); and NK-104 and NK-105 for the toxin B gene (Barroso et al. 1990). Amplification conditions for the multiplex PCR for the NK primers were initial denaturation at 95 °C for 5 min, followed by 32 cycles of denaturation at 95 °C for 20 s, annealing at 62 °C for 120 s, elongation at 72 °C for 2 min, and a final extension step at 72 °C for 5 min. The presence of the binary toxin genes cdtA and cdtB was determined using primers and procedures previously described (Barroso et al. 1990, Stubbs et al. 2000). PCR reaction conditions were initial denaturation at 95 °C for 5 min, followed by 30 cycles of denaturation at 94 °C for 45 s, annealing at 52 °C for 60 s, elongation at 72 °C for 2 min, and a final extension stage for 5 min at 72 °C. PCR amplicons were resolved in a 1% molecular-grade agarose gel (Bioline, UK) in 1× TAE with GelRed and visualized as described above.
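The band-based comparison described above can be illustrated with a minimal sketch of the Sorensen distance between two presence/absence banding profiles. The isolate names and band sizes below are hypothetical, exact size matching is an assumption (real analyses usually allow a size tolerance), and this is not a reproduction of how MVSP computes or clusters the values.

```python
from itertools import combinations


def sorensen_distance(bands_a, bands_b):
    """Sorensen distance between two sets of amplicon sizes (presence/absence)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 0.0  # two empty profiles are treated as identical
    shared = len(a & b)
    # Distance = 1 - Sorensen-Dice similarity, where similarity = 2|A∩B| / (|A| + |B|)
    return 1.0 - (2.0 * shared) / (len(a) + len(b))


def pairwise_distances(profiles):
    """Distance for every pair of isolates, keyed by the sorted pair of names."""
    return {
        (x, y): sorensen_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical band sizes in bp; not taken from the study.
profiles = {
    "isolate_1": {250, 300, 350, 420},
    "isolate_2": {250, 300, 350, 420},
    "isolate_3": {250, 310, 350, 500},
}
distances = pairwise_distances(profiles)
```

A distance of 0 means identical banding patterns and 1 means no shared bands; the resulting matrix could then be passed to any hierarchical clustering routine.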

Whole-genome sequencing
To further assess the genome diversity within the isolates, a total of eleven isolates comprising three isolates of RT091, two isolates each of RT001, RT035, and RT604, and one isolate each of RT010 and RT011 were sequenced using the Illumina MiSeq (2 × 250 bp paired end) platform following Nextera XT library preparation. The genomic DNA was prepared from broth cultures that were grown for 18-24 hours anaerobically in BHI broth (Oxoid, Hampshire, UK) using a QIAGEN Genomic Kit according to the manufacturer's instructions. Approximately 1 ng of DNA was used in the Nextera XT DNA sample preparation (Illumina, San Diego, California, USA), following the manufacturer's instructions. Libraries were sequenced using a MiSeq V2 reagent kit (2 × 250 bp). Genomes were assembled using SPAdes 2.0 with the following parameters: '-k 21,33,55,77,99,127 --careful'. All genomes were submitted to the European Bioinformatics Institute (EBI) and Enterobase under the project accession PRJEB8702. The genomes can be accessed online at: https://www.ebi.ac.uk/ena/browser/view/PRJEB8702. Contigs were ordered against the reference strain C. difficile CD630 (NC_009089) using MAUVE v2.3.1 (Darling et al. 2004). Genomes were annotated using PROKKA v1.14.5 with the following settings: '--compliant --genus Clostridium --usegenus' (Seemann 2014). Clostridioides difficile isolates were sequence-typed as previously described by Griffiths et al. (2010), utilizing seven regions within the conserved housekeeping genes (adk, atpA, dxr, glyA, recA, sodA, and tpi). Alleles from the assembled genomes were extracted and queried against the curated C. difficile database (https://doi.org/10.12688/wellcomeopenres.14826.1; Jolley et al. 2018). To ascertain the phylogenetic relationships between the new isolates and strains from different global locations and relevant additional C. difficile strains, we chose the genomes of 78 C. difficile strains that are publicly available on Enterobase (http://enterobase.warwick.ac.uk) and NCBI based on their strain types, diverse sources, and geographic locations (Supplementary Table S2). A maximum likelihood tree was constructed using PhyML (Guindon et al. 2010) as described previously (Didelot and Wilson 2015). Recombination was accounted for using ClonalFrameML (Didelot and Wilson 2015). The tree was visualized in iTOL software v6.4.3 (Letunic and Bork 2019).
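The sequence-typing step described above amounts to looking up a seven-locus allele profile in a table of known profiles. The sketch below shows that lookup only; the allele numbers and ST labels are hypothetical placeholders, not entries from the curated C. difficile database, and allele calling from assemblies is not shown.

```python
# The seven MLST loci named in the text (Griffiths et al. 2010 scheme).
LOCI = ("adk", "atpA", "dxr", "glyA", "recA", "sodA", "tpi")

# HYPOTHETICAL profile table: allele numbers and ST labels are placeholders.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 2, 1, 1, 3, 1, 1): "ST-B",
}


def assign_st(alleles, profiles=PROFILES):
    """Return the ST label for a locus -> allele-number mapping, or None if unknown."""
    key = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(key)


# Hypothetical allele calls for one assembled genome.
calls = {"adk": 1, "atpA": 2, "dxr": 1, "glyA": 1, "recA": 3, "sodA": 1, "tpi": 1}
st = assign_st(calls)
```

A return value of `None` would flag a profile absent from the table, which in practice would be submitted to the database curators for a new ST assignment.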

CRISPR arrays prediction
To establish the diversity of the CRISPR-Cas system within the genomes of the isolates, array prediction was conducted using PILER-CR 1.06 with default settings (Edgar 2007, Ekseth et al. 2013). Direct repeat (DR) sequences were aligned in Clustal Omega (Sievers et al. 2011) to establish consensus sequences and viewed with Jalview v2 (Waterhouse et al. 2009). The webserver PADLOC was used to determine the CRISPR-Cas system types within the genomes of the isolates based on profile Hidden Markov Models (Payne et al. 2021). Identified spacers were searched against GenBank and NCBI nucleotide BLAST and RefSeq-Plasmid databases to identify a possible extrachromosomal origin using the CRISPRTarget tool (Biswas et al. 2013). The default values used by NCBI BLASTn for short sequences, <30 bases (defaults for long sequences are in brackets), are: gap open −5 (−5), gap extend −2 (−2), match +1 (+1), mismatch −10 (−10), minimum score 30 (Biswas et al. 2013).
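To make the relationship between direct repeats and spacers concrete, here is a simplified sketch that recovers spacers as the segments lying between consecutive copies of a repeat. It assumes every repeat copy is an exact match to the consensus, which is a simplification; PILER-CR tolerates repeat variation and this is not its algorithm. The repeat and spacer sequences are toy values.

```python
def extract_spacers(array_seq, repeat):
    """Return the segments between consecutive exact copies of `repeat`."""
    parts = array_seq.split(repeat)
    # parts[0] is the upstream flank and parts[-1] the downstream flank;
    # everything in between is a spacer bounded by two repeats.
    return parts[1:-1]


# Toy example: flank + (repeat + spacer) x 2 + repeat + flank.
REPEAT = "GTTTTAGA"  # hypothetical, much shorter than the ~29 bp DRs in the text
array_seq = "CC" + REPEAT + "ACGTAC" + REPEAT + "TTGACA" + REPEAT + "GG"
spacers = extract_spacers(array_seq, REPEAT)
```

Each extracted spacer would then be the query in a search such as the CRISPRTarget lookup described above.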

Clostridioides difficile was isolated from four of the seven sampling sites
Of the seven sites sampled, only four (Dokan, Jalee, Hamamok, and Chnarok) yielded C. difficile, and 31 isolates were recovered from these samples (Table 1, Supplementary Table S1, Fig. S1). We did not isolate C. difficile from Taq Taq river (one soil and one sediment sample), Safeen mountain (two soil samples), or Haibat Sultan mountain (one soil sample), despite sampling Safeen mountain twice, in the summer and winter of 2012 and 2013, respectively.

Diversity of the isolates based on ribotypes and toxin genes carriage
Six RTs were identified: RT001 (five isolates), RT010 (five isolates), RT011 (two isolates), RT035 (two isolates), RT091 (eight isolates), and RT604 (nine isolates). Although the study only examined a small number of isolates, diverse RTs were observed both across the sites and within specific samples (Table 1, Supplementary Table S1, Fig. S1). Characterizing the isolates based on the presence or absence of C. difficile toxin genes showed that, of the 31 isolates, 25 (81%) were negative for both the tcdA and tcdB genes (A⁻B⁻), while the remaining six (19%) encode both toxin genes (A⁺B⁺). The RT011 isolates from Dokan (F9) and Jalee (CD105KSE6) had contrasting toxin profiles, with the latter being toxin negative, while the F9 strain encodes both toxin genes tcdA and tcdB (Table 1). Furthermore, all the strains were binary toxin-negative (CDT⁻; Table 1). PCR amplicons were sequenced and shown to match the genome data.

Diverse MLST profiles exist among the strains
To gain a detailed understanding of the genome characteristics of the representative RT isolates, a total of eleven isolates representing RT091 (three isolates), RT001, RT035, RT604 (two isolates each), and one isolate each from RT010 and RT011 were sequenced. The assembled genomes ranged from 49 to 458 contigs and 2594-2823 open reading frames per genome (as of February 2023; Supplementary Table S3), and their completeness and contamination levels are shown in Supplementary Table S4.

Phylogenetic relationships based on core genome
We explored the phylogenetic relationships and diversity of our eleven sequenced strains in the context of 78 other publicly available C. difficile genomes from diverse geographical regions, comprising 28 RTs and 29 strain types of the eight known clades (1, 2, 3, 4, 5, C-I, C-II, and C-III; Supplementary Table S2). Phylogenetic analysis based on whole-genome alignment resolved eight clades: the five discrete clades (Clades 1, 2, 3, 4, and 5) and the three previously observed deeply branching clades (Clades C-I, C-II, and C-III; Squire et al. 2015, Ramírez-Vargas et al. 2018, Knight et al. 2021; Fig. 1).
Consistent with other findings, Clade 1 is the most diverse, comprising seventeen RTs and sixteen STs, and includes toxigenic and non-toxigenic isolates (Supplementary Table S2; Janezic and Rupnik 2015, Janezic et al. 2016). Nine of the isolates characterized in this study (CD105KSE1, CD105KSE2, CD105KSE3, CD105KSE4, CD105KSE5, CD105KSE6, CD105KSE9, CD105KSO10, and CD105KSE11) belonged to Clade 1. Clostridioides difficile strains CD105KSE3, CD105KSE4, CD105KSE5, and CD105KSE11 closely cluster with the other strains of the same RT. However, CD105KSE6 (RT035) clusters distantly from strains of the same RT. Clade C-I, reclassified together with the C-II and C-III clades as novel independent Clostridioides genomospecies, comprised RT206, RT289, RT290, RT127, and RT604 isolates, in addition to six other isolates. The RT604 isolates from this study (CD105KSO7 and CD105KSO8) are new additions to Clade C-I and the only strains from an environmental source, with the rest of this clade being of clinical origin (Fig. 1).
Table 1 footnote: The RT designation was ascertained using capillary ribotyping targeting the 16S-23S rRNA intergenic spacer. Toxin profiles were determined using PCR to amplify the partial and essential repeat regions of toxin A genes, toxin B, and binary toxin genes.

Multiple prophage carriage detected in environmental strains of C. difficile
We explored the genomes of the isolates and detected multiple intact and partial prophages within them (Supplementary Table S5). The size of the intact prophages ranged from 20.6 to 137.9 kb, while the incomplete prophages ranged from 6.8 to 62.1 kb (Supplementary Table S5). Two intact prophages were identified in CD105KSE1, CD105KSE2, and CD105KSO10, while three intact prophages were found in CD105KSE3, CD105KSE4, CD105KSE5, CD105KSO7, and CD105KSE11. Strains CD105KSE9, CD105KSO8, and CD105KSE6 had five, four, and one intact prophages predicted in their genomes, respectively. Further analysis of all the predicted regions of the intact prophages in the isolates using BLAST showed similarity to other C. difficile phages (Supplementary Table S5).

CRISPR-Cas system diversity in environmental C. difficile strains
The genomes of the eleven strains were also screened for the presence of CRISPR-Cas systems and found to encode multiple CRISPR arrays, ranging from three to twelve per genome (Fig. 2), with a variable number of DRs (average length ∼29 bp) separated by variable spacer contents, ranging from 45 to 112 spacers per strain. A total of 97 DRs were extracted from the CRISPR arrays of the 11 strains, and 22 different DR consensus sequences were identified. Of these, six consensus DRs were unique (Supplementary Fig. S2). SNPs and identical DR sequences were also observed within the arrays of multiple strains (Supplementary Fig. S2). The observed variation is perhaps to be expected given how widespread the system is (Rath et al. 2015). When the CRISPR-Cas systems were defined within the genomes of the C. difficile strains, two known classes of CRISPR-Cas systems (class 1 subtype I-B1 and class 2 type V CRISPR-Cas systems) and a CRISPR-Cas type that has only two genes, homologous to cas6b and cas8b, were identified (Fig. 2). … (Schunder et al. 2013, Vestergaard et al. 2014), and also known as a genome editing system that comprises crRNA and the Cas12a protein (Liu et al. 2020). No cas genes were identified in the genomes of the RT604 strains CD105KSO7 and CD105KSO8. This lack of cas genes could be due to horizontal gene transfer resulting in several independent deletions of the complete set of cas genes, as shown in enterococcal strains (Palmer and Gilmore 2010).
In the subtype I-B1 CRISPR-Cas system, the two mainly conserved clusters of cas genes were identified, consistent with an earlier report (Andersen et al. 2016). The first cas gene cluster, casA, encodes a partial cas gene set (casb6, casb8, casb7, casb5, and casb3) lacking casb1, casb2, and casb4 and was identified in 81.8% of the genomes. The second cas gene cluster is casB, which encodes a complete set of subtype I-B1 cas genes (casb6, casb8, casb7, casb5, and casb3, as well as casb4, casb1, and casb2) and was identified in 27.3% of the sequenced strains (Andersen et al. 2016, Maikova et al. 2018; Fig. 2). The incidence of cas operons was found to be associated with the RT profiles; for example, strains of RT091, RT035, and RT010 have similar cas gene clusters, casA, that encode a partial cas gene set (Fig. 2). A class 2 type V Cas system with a single large effector protein of 1380 amino acids (Cas12f) was found in 27.3% of the studied genomes (Pyzocha and Chen 2018, Xiao et al. 2020). Diversity was observed within the strains based on the multiple CRISPR-Cas types; 36.4% of the strains encode two different types of CRISPR-Cas systems within a single genome. For example, both strains of RT001 and the strain of RT010 have class 1 subtype I-B1 and class 2 type V CRISPR-Cas systems. Interestingly, the RT091 strain, CD105KSE1, encodes subtype I-B1 with a casA gene cluster and another CRISPR-Cas system of unknown type that has only two cas genes, casb6 and casb8 (Fig. 2).
Figure 2 (caption, partial): The class 1 subtype I-B1 Cas system, identified in 81.8% of the genomes, codes for the two mainly conserved clusters of cas genes identified in this type: cluster casA, which encodes a partial cas gene set (casb6, casb8, casb7, casb5, and casb3), and cluster casB, which codes for a complete set of subtype I-B1 cas genes (casb6, casb8, casb7, casb5, and casb3) and (casb1, casb2, and casb4). The class 2 type V Cas system, found in 27.3% of the genomes, encodes a single large effector protein (cas12f). The Cas type "other", found in only 9% of the genomes, has only two genes, homologous to cas6b and cas8b. Homologous cas genes are shown with coloured arrows, and the colour coding is the same for homologous cas genes.
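The percentages reported above are consistent with simple fractions of the eleven sequenced genomes. The sketch below back-calculates them; the genome counts (9, 3, 4, and 1) are inferred here from the reported percentages and are an assumption, not values stated in the text.

```python
TOTAL_GENOMES = 11  # number of sequenced strains, as stated in the text

# ASSUMPTION: counts inferred from the reported percentages, not given in the source.
inferred_counts = {
    "subtype I-B1 casA cluster": 9,
    "class 2 type V system": 3,
    "two CRISPR-Cas types in one genome": 4,
    "two-gene 'other' type": 1,
}


def percentage(count, total=TOTAL_GENOMES):
    """Share of genomes as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


percentages = {name: percentage(n) for name, n in inferred_counts.items()}
```

Under these assumed counts the first three values reproduce the reported 81.8%, 27.3%, and 36.4%, and the last gives 9.1%, which the figure caption rounds to 9%.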

CRISPR spacers homology among the C. difficile strains
To determine if the CRISPR-Cas systems of the 11 characterized strains could target known phages, the spacers of the arrays within the genomes of the strains were searched against GenBank and BLAST nucleotide databases and RefSeq-Plasmid databases using the CRISPRTarget tool (Biswas et al. 2013). In total, 1054 spacers were identified from the genomes of our strains, of which 185 were identical to other published C. difficile phage and plasmid sequences from a diverse range of geographical locations, and 869 spacers were novel (Supplementary Table S6). Of the 185 identical spacers, 118 were identical to other published C. difficile phages and 67 were identical to plasmid sequences (as of March 2023; Fig. 3, Supplementary Table S6). Strains of RT001, RT091, and RT604 share a similar spacer sequence identity consistent with their evolutionary relationships (Boudry et al. 2015). Both RT604 isolates have the lowest number of spacers (45 spacers), and both strains have three CRISPR arrays with the same number of spacers in each array. However, only two spacers from the arrays of both strains have sequences similar to other published C. difficile phages. This suggests that the majority of the spacers might be derived from unknown phages that have yet to be isolated or characterized. Spacer numbers 13 and 39 in CD105KSO7 have sequences similar to spacer numbers 40 and 22 of strain CD105KSO8, respectively. We observed conserved numbers of arrays and spacers among the three strains of RT091, except that strain CD105KSE2 has one extra array with four spacers (Fig. 3, Supplementary Table S6). Whilst the RT035 strains shared some common spacers, CD105KSE5 lacked the spacer for phiCD146, suggesting that dynamic changes in the CRISPR array content had occurred, possibly through interactions with foreign DNA elements (Hargreaves et al. 2014). In strains from all RTs, more than one CRISPR spacer within a strain targeted the same phage. For example, two spacers (31, 73) from different CRISPR arrays in strain CD105KSE1 targeted phiCD27, signifying constant interactions between this strain and the corresponding phage (Boudry et al. 2015). Some of the strains carry multiple spacers for the same phage, such as CD105KSE3, which has spacer 12 and spacer 79 showing identical matches to phiCD146. Spacers for phiCDHM19 were only observed in two strains, CD105KSE5 and CD105KSE11, suggesting a less widespread predicted immunity to this phage (Mayer et al. 2008). … imply probable hot spots of phage genome evolution, loci in which bacterial strains are more exposed to these phages, which have counter-evolved through infections.
Figure 3 (caption, partial): Marked boxes denote spacers that match both plasmid and C. difficile phage sequences, spacers that match C. difficile phage sequences, and spacers that target multiple plasmid sequences; the accompanying numbers give the number of matched sequences in each case. White boxes represent spacers that have no homology to any C. difficile phages (Supplementary Table S6).
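The spacer tallies reported above can be cross-checked with a few lines of arithmetic. This is only a consistency check on the published totals (1054 spacers, 118 phage matches, 67 plasmid matches); the function and key names are illustrative.

```python
def summarise_spacers(total, phage_hits, plasmid_hits):
    """Derive matched and novel spacer counts from the reported totals."""
    matched = phage_hits + plasmid_hits
    novel = total - matched
    return {
        "matched": matched,
        "novel": novel,
        "matched_pct": round(100.0 * matched / total, 1),
    }


# Totals as reported in the text.
summary = summarise_spacers(total=1054, phage_hits=118, plasmid_hits=67)
```

The derived values agree with the text: 185 matched spacers and 869 novel ones, so fewer than one in five spacers has an identical match among published C. difficile phage or plasmid sequences.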

Discussion
The paucity of information on environmental C. difficile strains in the Middle East compared to strains from western countries reflects the lack of study in this area. Typically, this pathogen has been considered to be a problem in the western world, and thus, it has not been a priority in other places. However, new geographical areas such as Slovenia and Thailand are beginning to explore this pathogen from different environmental sources to understand the diversity that exists amongst the strains (Janezic et al. 2016, Putsathit et al. 2017, Imwattana et al. 2020, Tkalec et al. 2020).
The very limited studies on this pathogen in the Middle East have focused on characterizing isolates from clinical/hospital sources (Khalil et al. 2019, Shoaei et al. 2019, Al-Tawfiq et al. 2020, Azimirad et al. 2020, Baghani et al. 2020, Williamson et al. 2022), raw meat (Esfandiari et al. 2014, Bakri 2018, Ersoz Seyma and Cosansu 2018), food products (Rahimi et al. 2015, Bakri 2016), and supermarket environments (Sadeghifard et al. 2010, Shoaei et al. 2019). In the Middle East region, the reported prevalence rates of CDI are 23.8% in Jordan, 8%-10% in Kuwait, and 5.15% in Saudi Arabia (Alzouby et al. 2020). However, there are no surveillance strategies to show the occurrence of CDI in northern Iraq. The lack of information on the strains that are found in the region's natural environment, or potentially transmitted by human and animal activities, may greatly affect the control of this infection in this region and the world at large. Also, environmental C. difficile strains have been reported to encode several genetic elements that could contribute to the emergence of novel clinical strains in hospitals (Hargreaves et al. 2015).
We previously reported the genome characteristics of three novel species of Clostridia from the natural habitats of this region, in which all three isolates encode multiple prophage elements and a CRISPR-Cas system was found in two of the isolates (Rashid et al. 2016). Here, we went further to isolate and characterize C. difficile isolates from river sediments and soils in northern Iraq.
In the current study, we isolated C. difficile from the sediment and soil samples of four of the seven examined sites in northern Iraq. This indicates that these sources are important habitats in which to study C. difficile presence and diversity, which concurs with previous work conducted in our laboratory and elsewhere (al Saif and Brazier 1996, Hargreaves et al. 2013, Janezic et al. 2016, Rodriguez et al. 2019, Williamson et al. 2022). Again, consistent with previous work, most of the C. difficile isolates (21/31 isolates, ∼68%) were from sediment samples. This may be attributed to the dormant spores, which protect the bacteria and therefore may contribute to the transmission and persistence of C. difficile in the marine ecosystem (Zidaric et al. …). Despite the small sample size, we had good recovery rates of C. difficile, which we isolated from four of the seven examined sites (∼60%), suggesting it is abundant in these types of areas. This rate of C. difficile isolation is comparable to a previous study, in which 54% and 60% recovery were observed in two consecutive years (Hargreaves et al. 2013). However, one other study was unable to detect C. difficile from sediment samples (Pasquale et al. 2011), and others detected it in only 24.0% of environmental samples (Janezic et al. 2016). The bacterial recovery observed in our study may also be attributed to the enrichment procedures carried out on the samples before isolation, which greatly enhanced the isolation of the bacterium (Hargreaves et al. 2013).
Isolates of both environmentally associated RTs (RT010, RT035, RT091, and RT604) and clinically important RTs (RT001, RT011) were detected in our sample sites, which concurs with previous studies (Hargreaves et al. 2013, Hargreaves et al. 2016, Janezic et al. 2016). Whilst RT001, RT010, and RT035 have previously been isolated from the environments of Europe, all studies conducted in the Middle East have associated these RTs with clinical samples (Al-Tawfiq and Abed 2010, Hargreaves et al. 2013, Al-Thani et al. 2014, Azimirad et al. 2020, Baghani et al. 2020). To our knowledge, this is the first time that these RTs have been found to be associated with environmental sources in these parts of the country. This may be linked to human activities such as recreational activities in all the sample areas and the agricultural runoff found in Dokan. In contrast, none of the environmental RT strains reported by other researchers from the Middle East were isolated in this study. This may be attributed to the fact that some of the previously reported strains were not ribotyped; hence, their identities are unknown (Jamal et al. 2002, Rotimi et al. 2003, Rahimi et al. 2015). In addition, toxigenic strains that are associated with community-acquired infections were isolated from retail surfaces, which underscores the need to understand their medical impact and to enact any necessary preventative measures (Alqumber 2014). We found RT604 and RT091, which are uncommon in the West, to be prevalent in the examined areas at the time of our sampling. In contrast, RT010, which is common in both Europe and the United Kingdom, was found to be rare in the region examined (Rotimi et al. 2003, Al-Thani et al. 2014, Baghani et al. 2020). This suggests that certain C. difficile strains are more prevalent in certain regions of the world than in others.
The isolation of both toxigenic and nontoxigenic isolates in this study is consistent with previous studies (Rotimi et al. 2003, Hargreaves et al. 2013, Janezic et al. 2016). The diverse toxin gene profiles observed within an RT show that the pathogenicity locus is variable, may not be a feature of clonality, and could be readily lost (Dingle et al. 2014). We did not isolate any binary toxin-positive environmental isolates. However, RT078 and RT027 isolates, which encode a binary toxin, have previously been identified in environmental samples from England and Saudi Arabia, and their presence may be attributed to human or animal activities (Hargreaves et al. 2013, Bakri 2016).
The isolation of six RTs that are associated with six MLST profiles also concurs with previous studies showing that, although STs are normally associated with a specific RT, they may not always predict the strain type, and vice versa (Griffiths et al. 2010, Wang et al. 2018, Zhao et al. 2021). Multiple RTs (RT091 and RT035) are associated with ST-107, and multiple STs (ST-181 and ST-177) are related to RT604. The association of RTs with multiple STs, or vice versa, has previously been reported and may suggest the constantly divergent nature of C. difficile genomes (Dingle et al. 2011, Stabler et al. 2012, Janezic and Rupnik 2015, Janezic et al. 2016, Knight et al. 2017). Phylogenetic analysis of the strains isolated in this study identified a lineage (Clade C-I) that is highly divergent from the other five established clades. In line with earlier studies, Clade 1 is diverse in terms of RTs and STs, and comprises both toxigenic and nontoxigenic strains (Stabler et al. 2012, Janezic and Rupnik 2015, Janezic et al. 2016, Ramírez-Vargas et al. 2018).
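The many-to-many relationship between RTs and STs described above can be illustrated with a short Python sketch. It uses only the five novel RT/ST pairs listed in the abstract; the function name `multi_associations` is a hypothetical helper introduced here for illustration and is not part of the typing workflow used in this study.

```python
from collections import defaultdict

# RT/ST pairs reported as novel associations in the abstract of this study.
pairs = [
    ("RT011", "ST-137"),
    ("RT035", "ST-107"),
    ("RT091", "ST-107"),
    ("RT604", "ST-177"),
    ("RT604", "ST-181"),
]


def multi_associations(pairs):
    """Return RTs linked to more than one ST, and STs linked to more than one RT."""
    rt_to_sts = defaultdict(set)
    st_to_rts = defaultdict(set)
    for rt, st in pairs:
        rt_to_sts[rt].add(st)
        st_to_rts[st].add(rt)
    rts_multi = {rt: sorted(sts) for rt, sts in rt_to_sts.items() if len(sts) > 1}
    sts_multi = {st: sorted(rts) for st, rts in st_to_rts.items() if len(rts) > 1}
    return rts_multi, sts_multi


rts_multi, sts_multi = multi_associations(pairs)
print(rts_multi)  # {'RT604': ['ST-177', 'ST-181']}
print(sts_multi)  # {'ST-107': ['RT035', 'RT091']}
```

With these five pairs, the sketch recovers the two cases named in the text: one RT (RT604) spanning two STs, and one ST (ST-107) spanning two RTs.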
The occurrence of multiple and diverse prophage carriage within C. difficile is high and has previously been observed in environmental strains (Shan et al. 2012, Hargreaves et al. 2013, Hargreaves et al. 2015, Mullany et al. 2015). Here, we detected up to six intact prophages in a single C. difficile genome, and this complex network of prophages within environmental strains could contribute to the evolution of new pathogenic strains. Further work is required to ascertain whether all six prophages are inducible, as observed in previous work (Fortier and Sekulovic 2013, Hargreaves et al. 2015).
Evidence of the interplay between hosts and phages can be seen in the CRISPR arrays detected here. The CRISPR-Cas system is a form of adaptive immunity that bacteria use to resist phage infection (Hargreaves et al. 2014, Maikova et al. 2018). Our results show that this system is diverse within our strains (Soutourina et al. 2013, Hargreaves et al. 2014, Hargreaves et al. 2016). Here, we showed for the first time the presence of the class 2 type V CRISPR-Cas system in C. difficile strains. To date, class 1 subtype I-B is the native CRISPR-Cas system described in C. difficile (Boudry et al. 2015, Maikova et al. 2018, Maikova et al. 2019). Both strains of RT001 possess a class 2 type V CRISPR-Cas system located within a mobile genetic region. This system may have been acquired via horizontal gene transfer, as studies have proposed that class 2 effectors originated from nucleases encoded by different mobile genetic elements (MGEs; Koonin and Makarova 2019). We have also reported the presence of two or more CRISPR-Cas types within the genome of a single strain; 27.3% of the sequenced strains carry class 2 type V CRISPR-Cas systems alongside the native subtype I-B CRISPR-Cas systems (Fig. 2). Multiple CRISPR-Cas systems have been found to occur naturally in some organisms (Carte et al. 2014). Consistent with earlier reports, both cas gene sets (casA and casB) of the I-B subtype were found within the sequenced strains (Boudry et al. 2015, Maikova et al. 2018). The occurrence of cas operons was found to be associated with the RT profiles. The variation in CRISPR-Cas system types and their contents within RT strains could affect their susceptibility to infection by phages (Hargreaves et al. 2014). The spacer contents of the CRISPR arrays are identical to known phage sequences and are particularly insightful, since it was previously shown that 100% identity between spacer and proto-spacer sequences is required to provide immunity (Boudry et al. 2015, Maikova et al. 2018, Deem 2020). However, small numbers of mismatches could still confer a degree of immunity during infection through target cleavage (Michael et al. 2022). Our data are in line with earlier studies and support the potential role of phages in driving the evolution of epidemic strains (Hargreaves and Clokie 2014).
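The distinction drawn above between 100% spacer/proto-spacer identity and a small number of mismatches can be sketched in Python as follows. This is an illustrative toy only, not the search method used in this study: the sequences are invented, `max_mismatches` is a hypothetical threshold, and real spacer searches typically rely on alignment tools and also consider flanking motifs.

```python
def count_mismatches(spacer: str, protospacer: str) -> int:
    """Count position-wise differences between two equal-length sequences."""
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(spacer.upper(), protospacer.upper()))


def classify_match(spacer: str, protospacer: str, max_mismatches: int = 2) -> str:
    """Label a spacer/proto-spacer pair; max_mismatches is an assumed cutoff."""
    n = count_mismatches(spacer, protospacer)
    if n == 0:
        return "identical"
    if n <= max_mismatches:
        return "near-identical"
    return "no match"


# Invented toy sequences, for illustration only.
spacer = "ATGCGTACCTGA"
print(classify_match(spacer, "ATGCGTACCTGA"))  # identical
print(classify_match(spacer, "ATGCGTACCTGT"))  # near-identical (1 mismatch)
print(classify_match(spacer, "TTTTTTTTTTTT"))  # no match (9 mismatches)
```

Only the first category corresponds to the identical spacers reported in this study; the middle category reflects the possibility, noted above, that a few mismatches may still permit some degree of immunity.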

Conclusions
To conclude, C. difficile strains were found to be present in the natural environment of northern Iraq and were readily isolated from 57% of the samples obtained. Genome analysis showed that these strains are diverse and distinct from those found elsewhere and, as is seen in all C. difficile genomes, these strains carried multiple prophages, together with diverse CRISPR-Cas system types whose arrays contain diverse spacers. We have shown, for the first time, an instance of the class 2 type V CRISPR-Cas system in C. difficile strains; this system has been described in other bacterial genomes.
Although this was a small-scale study, the observations of RT and genome diversity in this region contribute to an overall understanding of the diversity of this organism. Studies in new geographies will reveal further insights into how this pathogen can evolve and increase our understanding of the relationship between strains observed in patients and those found in the environment.

Figure 1. Phylogenetic tree showing five clades (1, 2, 3, 4, and 5) and three cryptic clades (C-I, C-II, and C-III) of the C. difficile isolates examined, based on core genome comparison. A maximum likelihood tree was constructed from the core genes of the 11 strains examined in this study (*) and 78 other reference C. difficile strains using PhyML, as described previously. Single nucleotide polymorphisms (SNPs) in core genes were utilized for the phylogeny, and recombination was accounted for with ClonalFrameML. The tree was visualized in iTOL v6.4.3.
; Payne et al. 2021). The class 1 subtype I-B1 CRISPR-Cas system is characterised by multi-subunit protein effectors and was previously observed in all queried genomes of C. difficile strains (Hargreaves et al. 2014, Boudry et al. 2015, Andersen et al. 2016, Maikova et al. 2018, Maikova et al. 2019). Class 2 type V CRISPR-Cas systems possess single, large protein effectors (Makarova et al. 2015), observed in several bacterial genomes

Figure 2. Schematic diagram showing the CRISPR-Cas systems carried by the environmental C. difficile isolates examined in this study. Typical operon organization is shown for each CRISPR-Cas system. The class 1 subtype I-B1 Cas system, identified in 81.8% of the genomes, codes for the two mainly conserved clusters of cas genes identified in this type. Cluster casA encodes a partial cas gene set (casb6, casb8, casb7, casb5, and casb3), and cluster casB codes for a complete set of subtype I-B1 cas genes (casb6, casb8, casb7, casb5, and casb3) and (casb1, casb2, and casb4). The class 2 type V Cas system, found in 27.3% of the genomes, encodes a single large effector protein (cas12f). A Cas system of type "other", found in only 9% of the genomes, has only two genes, homologous to cas6b and cas8b. Homologous cas genes are shown with coloured arrows. Colour coding is the same for homologous cas genes.
Figure 3. CRISPR arrays and the corresponding identical spacers encoded in the C. difficile isolates examined in this study. Identical sequences between spacers and phage sequences are indicated by matching colours; each colour represents similarity with a particular C. difficile phage (see legend). Coloured spacers with numbers correspond to multiple protospacers that match C. difficile phage and plasmid sequences, as shown in Supplementary Table S6. ( refers to the spacers that match plasmid and C. difficile phage sequences, and the numbers refer to the number of matched plasmid and C. difficile phage sequences; refers to the spacers that match C. difficile phage sequences, and the numbers refer to the number of matched phages; and refers to spacers that target multiple plasmid sequences, and the numbers refer to the number of matched plasmids). White boxes represent spacers that have no homology to any C. difficile phages.
Table 1. RT designation and toxin gene carriage of isolates examined in this study.

Table 2. Characteristics of the allelic profiles (STs) of the 11 strains of C. difficile isolated in this study, and ST/RT associations.